North and South: Danish voters and Syrian refugees

Task 1: Get spatial data for municipalities in Denmark

You can download administrative data for Denmark from the GADM dataset, the Global administrative boundaries, hosted by UCDavis. You do this by using the getData() function in the raster package. For GADM data, you need to specify what level of admin boundaries you wish to download (0=country, 1=first level subdivision aka regions, 2=second level aka municipalities, etc.). Read this blog on the power of raster package when it comes to available datasets.

Instructions:

  • Load the boundaries of Danish municipalities from data/ folder, convert to simple feature and transform to CRS 25832.
  • Sort the NAME_2 field to see how the Danish municipalities are spelled. You may need to change them later for the spatial data to join the attributes.
# Load the spatial data, project to UTM
mun_sp<- readRDS(________) # it's the gadm_... 2.rds dataset
mun_sf <- ________(mun_sp)
mun <- st_transform(_____, crs = ____)

# Plot so as to check correct location and complete coverage


# Check the names


# Straighten the names (return here after Task 2)
Simple feature collection with 99 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
Projected CRS: ETRS89 / UTM zone 32N
First 10 features:
      GID_0  NAME_0   GID_1      NAME_1 NL_NAME_1     GID_2      NAME_2
36829   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.1_1 Albertslund
36665   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.2_1     Allerød
36779   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.3_1    Ballerup
37011   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.4_1    Bornholm
36901   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.5_1     Brøndby
      VARNAME_2 NL_NAME_2  TYPE_2    ENGTYPE_2 CC_2   HASC_2
36829      <NA>      <NA> Kommune Municipality <NA> DK.HS.AB
36665      <NA>      <NA> Kommune Municipality <NA> DK.HS.AL
36779      <NA>      <NA> Kommune Municipality <NA> DK.HS.BA
37011      <NA>      <NA> Kommune Municipality <NA> DK.HS.BO
36901      <NA>      <NA> Kommune Municipality <NA> DK.HS.BR
                            geometry
36829 MULTIPOLYGON (((712057 6173...
36665 MULTIPOLYGON (((700891 6191...
36779 MULTIPOLYGON (((715156 6178...
37011 MULTIPOLYGON (((878103.7 61...
36901 MULTIPOLYGON (((716929 6168...
 [ reached 'max' / getOption("max.print") -- omitted 5 rows ]

 [1] "Albertslund"       "Allerød"           "Ballerup"         
 [4] "Bornholm"          "Brøndby"           "Christiansø"      
 [7] "Dragør"            "Egedal"            "Fredensborg"      
[10] "Frederiksberg"     "Frederikssund"     "Furesø"           
[13] "Gentofte"          "Gladsaxe"          "Glostrup"         
[16] "Gribskov"          "Halsnæs"           "Helsingør"        
[19] "Herlev"            "Hillerød"          "Høje Taastrup"    
[22] "Hørsholm"          "Hvidovre"          "Ishøj"            
[25] "København"         "Lyngby-Taarbæk"    "Rødovre"          
[28] "Rudersdal"         "Tårnby"            "Vallensbæk"       
[31] "Århus"             "Favrskov"          "Hedensted"        
[34] "Herning"           "Holstebro"         "Horsens"          
[37] "Ikast-Brande"      "Lemvig"            "Norddjurs"        
[40] "Odder"             "Randers"           "Ringkøbing-Skjern"
[43] "Samsø"             "Silkeborg"         "Skanderborg"      
[46] "Skive"             "Struer"            "Syddjurs"         
[49] "Viborg"            "Aalborg"           "Brønderslev"      
[52] "Frederikshavn"     "Hjørring"          "Jammerbugt"       
[55] "Læsø"              "Mariagerfjord"     "Morsø"            
[58] "Rebild"            "Thisted"           "Vesthimmerland"   
[61] "Faxe"              "Greve"             "Guldborgsund"     
[64] "Holbæk"            "Kalundborg"        "Køge"             
[67] "Lejre"             "Lolland"           "Næstved"          
[70] "Odsherred"         "Ringsted"          "Roskilde"         
[73] "Slagelse"          "Solrød"            "Sorø"             
 [ reached getOption("max.print") -- omitted 24 entries ]

Task 2: Load voting data for 2011 and 2015 and inspect

In order to move on towards analysis, I have provided you with a summarized voting data for 5 biggest parties in 2011 and 2015 and 2019 by municipality. The columns list total votes per party, sum of the electorate and fraction that each party got in a given year.

  • Create the elections object from the data/elections.rds
  • Inspect it to ensure you understand what the columns contain
  • Join the data to the municipality shapes mun by the shared name
  • Plot the Socialdemocratie fraction across Denmark in 2015 and check whether you got all the municipalies. Fix names if not.
  • In which municipalities did the Social Democrats get the highest proportion of population in 2015?
# Load the summarized election data
elections_data <- readRDS("../data/elections.rds")

# Join the election data with municipality polygons


# Fix the missing counties 

# Map fraction of Socialdemokratie in 2015 to see no counties are missing
elections %>% 
  ________ %>% 
  filter(_________) %>%  # A.Socialdemokratie
  dplyr::select(______) %>% 
  mapview()

# Which municipalities are the biggest fans of Socialdemokratie?
Simple feature collection with 99 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 441745.6 ymin: 6049775 xmax: 892801.1 ymax: 6402207
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 99 × 3
# Groups:   NAME_2 [99]
   NAME_2      pct_vote2015                                             geometry
   <chr>              <dbl>                                   <MULTIPOLYGON [m]>
 1 Bornholm            56.3 (((878103.7 6112929, 878088.2 6112928, 878087 61129…
 2 Morsø               55.8 (((473803.7 6286625, 473923.9 6286624, 473998 62866…
 3 Christiansø         52.4 (((892608.6 6148064, 892610.4 6148044, 892618.5 614…
 4 Aalborg             52.1 (((531664.6 6318606, 531667.6 6318604, 531732.2 631…
 5 Thisted             52.0 (((471987.6 6284534, 472001.9 6284534, 472021.7 628…
 6 Ballerup            51.7 (((715156 6178972, 714227 6179009, 713611 6178539, …
 7 Odder               51.7 (((566871.2 6191988, 566871.7 6191957, 567173 61919…
 8 Nyborg              51.1 (((613677.9 6130271, 613678.7 6130239, 613681.1 613…
 9 Skive               50.7 (((482468.4 6275480, 482528.8 6275479, 482528.9 627…
10 Lolland             50.6 (((652471.1 6058085, 652470.1 6058116, 652398.5 605…
# ℹ 89 more rows

Task 3: Look at some of the data

Now that we have a well-structured, complete and spatial dataset, let’s explore the political preference distribution in space with the help of the lovely tmap library!

  • Filter your elections data for Social Democrats and Danske Folkeparti (Hint: grepl() is a good start)
  • then feed the result into tm_shape() and tm_polygons, faceting along the way by party. Since you have 2 parties, you should get two visuals.
  • repeat three times, changing the tm_polygons() data from pct_vote2011 to pct_vote2019
# Let's map the two most popular parties, SD and Danske Folkeparti through time
library(tmap)
elections %>%
    filter(grepl("^A|^O", Party)) %>%
    tm_shape() + tm_facets("Party", ncol = 2) + tm_polygons("pct_vote2011", title = "Percentage of Votes \nin 2011")

elections %>%
    filter(grepl("^A|^O", Party)) %>%
    tm_shape() + tm_facets("Party") + tm_polygons("pct_vote2015", title = "Percentage of Votes \nin 2015")

elections %>%
    filter(grepl("^A|^O", Party)) %>%
    tm_shape() + tm_facets("Party") + tm_polygons("pct_vote2019", title = "Percentage of Votes \nin 2019")

Task 4: Cartogram

As you can see from the maps, the area of municipalities varies considerably. When mapping them, the large areas carry more visual “weight” than small areas, although just as many people or more people live in the small areas. Voters in low-density rural regions can thus visually outweigh the urban hi-density populations.

One technique for correcting for this is the cartogram. This is a controlled distortion of the regions, expanding some and contracting others, so that the area of each region is proportional to a desired quantity, such as the population. The cartogram also tries to maintain the correct geography as much as possible, by keeping regions in roughly the same place relative to each other.

The cartogram package contains functions for creating cartograms. You give it a spatial data frame and the name of a column, and you get back a similar data frame but with regions distorted so that the region area is proportional to the column value of the regions.

You’ll also use the sf package for computing the areas of newly generated regions with the st_area() function.

Instructions

The elections sf object should be already loaded in your environment.

  • Load the cartogram package.
  • Filter out the Danske Folkeparti votes from your elections dataset, creating a DF object
  • Plot total electorate over municipality area for year 2015 in the DF data. Deviation from a straight line shows the degree of misrepresentation.
  • Create a cartogram scaling to the pct_vote2015 column.
  • Check that the DF voter population is proportional to the area.
  • Plot the pct_vote2015 percentage on the cartogram. Notice how some areas have relatively shrunk or grown.
# load library
library(cartogram)

# Filter out Danske Folkeparti
DF <- elections %>%
    filter(grepl("^O", Party))
# Check the spread of votes and municipality area
plot(DF$pct_vote2015, st_area(DF, byid = TRUE), xlab = "Vote %", ylab = "Area (m2)",
    main = "Dansk Folkeparti fraction per municipality area")


# Make a cartogram, scaling the area to the percentage of SD voters
DF2015 <- cartogram_cont(DF, "pct_vote2015")

# Check the linearity of the SD voters percentage per municipality plot
plot(DF2015$pct_vote2015, st_area(DF2015, byid = TRUE))

Copacetic cartogram! Now try to rerun the cartogram for the Social Democrats in 2015 and create a visual for both parties’ turnout and total electorate in 2015.

library(cartogram)
# Let's look at Social Democrats in 2015
SD <- elections %>%
    filter(grepl("^A", Party))

# Make a cartogram, scaling the area to the total number of votes cast in 2015
SD2015 <- cartogram_cont(SD, "sum2015")

# Now check the linearity of the total voters per municipality cartogram as
# opposed to the reality
plot(SD$sum2015, st_area(SD, byid = TRUE))  # reality
plot(SD2015$sum2015, st_area(SD2015, byid = TRUE))  # cartogram


# Make a adjusted map of the 2015 SD and DF voters
plot(SD2015$geometry, col = "pink", main = "% of Social Democrat votes across DK in 2015")
plot(DF2015$geometry, col = "lightblue", main = "% of Danske Folkeparti votes across DK in 2015")

Task 5: Spatial autocorrelation test

If we look at the facetted tmaps the election results in 2015 seem to have spatial correlation - specifically the percentage of voters favoring Danske Folkeparti increases as you move towards the German border. This trend is not as visible in the cartogram, where the growth is more apparent in Sjæland, and other islands, like Samsø.

How much similarity and spatial dependence is there, really?

By similarity or positive correlation, we mean : pick any two kommunes that are neighbors - with a shared border - and their attributes will be more similar than any two random municipalities. Such autocorrelation or spatial dependence can be a problem when using statistical models that assume, conditional on the model, that the data points are independent.

The spdep package has functions for measures of spatial autocorrelation, also known as spatial dependency. Computing these measures first requires you to work out which regions are neighbors via the poly2nb() function, short for “polygons to neighbors”. The result is an object of class nb. Then you can compute the test statistic and run a significance test on the null hypothesis of no spatial correlation. The significance test can either be done by Monte-Carlo or theoretical models.

In this example you’ll use the Moran “I” statistic to test the spatial correlation of the Danske Folkeparti voters in 2015.

Instructions I - defining neighbors

  • Load the elections spatial dataset with attributes
  • Consider simplifying the boundaries if the data is too heavy for your computer and takes long to visualise
  • Load the spdep library and create nb object of neighbors using queen adjacency
  • Pass elections to poly2nb() to find the neighbors of each municipality polygon. Assign to nb.
  • Get the center points of each municipality by passing elections to st_centroid and then to st_coordinates(). Assign to mun_centers.
  • Update the basic map of the DK municipalities by adding the connections.
    • In the second plot call pass nb and mun_centers.
    • Also pass add = TRUE to add to the existing plot rather than starting a new one.
# Reload the data if needed elections
plot(elections$geometry)

# Consider simplifying (don't go too high)
mun_sm <- st_cast(st_simplify(mun, dTolerance = 250), to = "MULTIPOLYGON")
plot(mun_sm$geometry)
length(st_is_valid(mun_sm$geometry))
[1] 99
# Use the spdep package
library(spdep)

# Make neighbor list following queen adjacency
nb <- poly2nb(mun_sm$geometry)
nb
Neighbour list object:
Number of regions: 99 
Number of nonzero links: 344 
Percentage nonzero weights: 3.509846 
Average number of links: 3.474747 
11 regions with no links:
4 6 10 43 55 57 63 68 79 84 89
16 disjoint connected subgraphs
# Get center points of each municipality
mun_centers <- st_coordinates(st_centroid(mun_sm$geometry))

# Show the connections
plot(mun_sm$geometry)
plot(nb, mun_centers, col = "red", add = TRUE)

Instructions II - Moran’s I

Now that your neighbors are determined and centroids are computed, let’s continue with the Moran’s I statistic

  • Create a subset with municipalities for O.Danske Folkeparti
  • Feed the pct_2011 vector into moran.test().
    • moran.test() needs a weighted version of the nb object which you get by calling nb2listw().
    • After you specify your neighbor nb object you should define the weights style = "W". Here, style = "W" indicates that the weights for each spatial unit are standardized to sum to 1 (this is known as row standardization). For example, municipality 1 has 3 neighbors, and each of those neighbors will have weights of 1/3. This allows for comparability between areas with different numbers of neighbors.
    • You will need another argument in both spatial weights and at the level of the test. zero.policy= TRUE deals with situations when an area has no neighbors based on your definition of neighbor (many islands in Denmark). When this happens and you don’t include zero.policy= TRUE, you’ll get an error.
    • Run the test against the theoretical distribution of Moran’s I statistic. Find the p-value. Can you reject the null hypothesis of no spatial correlation?
  • Inspect a map of pct_2011.
  • Run another Moran I statistic test, this time on Social Democrats.
    • Use 999 Monte-Carlo iterations via moran.mc().
    • The first two arguments are the same as for moran.test().
    • You also need to pass the argument nsim = 999.
    • Note the p-value. Can you reject the null hypothesis this time?
# Let's look at Danske Folkeparti in 2015
DF <- elections %>% 
  _____(____)

# Run a Moran I test test on 2015 DF vote
moran.test(DF$_________, 
           nb2listw(____, style = "W",zero.policy=TRUE),
           zero.policy=TRUE)

# Run a Moran I test test on 2011 DF vote
moran.test(DF$________, 
           nb2listw(___, style = "W",zero.policy=TRUE),
           zero.policy=TRUE)

# Do a Monte Carlo simulation to get a more reliable p-value
moran.mc(DF$_________,
         ________(____, zero.policy=TRUE),
         zero.policy=TRUE, nsim = 999)

    Moran I test under randomisation

data:  DF$pct_vote2015  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 6.6385, p-value = 1.584e-11
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.532756868      -0.011494253       0.006721352 

    Moran I test under randomisation

data:  DF$pct_vote2011  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 6.2079, p-value = 2.685e-10
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.499144047      -0.011494253       0.006766158 

    Monte-Carlo simulation of Moran I

data:  DF$pct_vote2015 
weights: nb2listw(nb, zero.policy = TRUE)  
number of simulations + 1: 1000 

statistic = 0.53276, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

Marvelous Moran Testing! When I ran the examples, the p-value was around 1.584e-11 in 2015 and 2.685e-10 in 2011 Moran tests, showing significant spatial correlation. In Monte Carlo simulation, the p-value was around 0.001, so I did find some significant spatial (positive) correlation.

Repeat the same test for Social Democrats

# Let's look at Social Democrats

# Run a Moran I test on percentage of SD turnout in 2011

# Run a Moran I test on percent of SD turnout in 2015

# Do a Monte Carlo simulation to get a better p-value

# Do a Monte Carlo simulation to get a better p-value

    Moran I test under randomisation

data:  SD$pct_vote2011  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 5.6495, p-value = 8.044e-09
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.450511071      -0.011494253       0.006687554 

    Moran I test under randomisation

data:  SD$pct_vote2015  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 5.0148, p-value = 2.654e-07
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
       0.40141853       -0.01149425        0.00677959 

    Monte-Carlo simulation of Moran I

data:  SD$pct_vote2011 
weights: nb2listw(nb, zero.policy = TRUE)  
number of simulations + 1: 1000 

statistic = 0.45051, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

    Monte-Carlo simulation of Moran I

data:  SD$pct_vote2015 
weights: nb2listw(nb, zero.policy = TRUE)  
number of simulations + 1: 1000 

statistic = 0.40142, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

Phenomenal political testing. Social Democrats also show positive correlation. p-value in Moran I test is was around 8.044e-09 in 2011 results and 2.654e-07 in 2015 results, and thus significant result. In Monte Carlo simulation, the p-value was around 0.24, suggesting there is significant (positive) spatial correlation.

Task 6: Different sorts of neighborhood: 100 to 50 km

Does the result hold if you use a different scale / neighborhood calculation?

Connect the nearest places (islands)

# Consider simplifying (don't go too high)
mun_sm <- st_cast(st_simplify(mun, dTolerance = 250), to = "MULTIPOLYGON")
plot(mun_sm$geometry)

# Use the spdep package
library(spdep)

# Get center points of each municipality
mun_centers <- st_centroid(mun_sm$geometry, of_largest_polygon = TRUE)

# Make neighbor list from neighbours at 100km distance
nb_100 <- dnearneigh(mun_centers, 0, 1e+05)
plot(mun_sm$geometry)
plot(nb_100, mun_centers, col = "red", add = TRUE)

# Make neighbor list from neighbours at 50km distance
nb_50 <- dnearneigh(mun_centers, 0, 50000)
plot(mun_sm$geometry)
plot(nb_50, mun_centers, col = "blue", add = TRUE)
title(main = "Neighbours within 50 km distance")

Task 7: Different sorts of neighbourhood: k neighbors

# Consider simplifying (don't go too high)
mun_sm <- st_cast(st_simplify(mun, dTolerance = 250), to = "MULTIPOLYGON")
plot(mun_sm$geometry)

# Use the spdep package
library(spdep)

# Get center points of each municipality
mun_centers <- st_centroid(mun_sm$geometry, of_largest_polygon = TRUE)

# Make neighbor list from k neighbours
k3 <- knearneigh(mun_centers, k = 3)
nb_k3 <- knn2nb(knearneigh(mun_centers, k = 3))
plot(mun_sm$geometry)
plot(nb_k3, mun_centers, col = "red", add = TRUE)
title(main = "3 nearest neighbours")

# Make neighbor list from k neighbours
nb_k5 <- knn2nb(knearneigh(mun_centers, k = 5))
plot(mun_sm$geometry)
plot(nb_k5, mun_centers, col = "red", add = TRUE)
title(main = "5 nearest neighbours")

Taks 8: Rerun Moran’s I and MC

Now let’s rerun Moran’s I and MC with different neighbour conceptions

# Run a Moran I test on Dansk Folkeparti votes in 2015 based on k neighbors
moran.test(DF$_____,
           ________(nb_k3, style = "W",zero.policy=TRUE),
           zero.policy=TRUE)

# Do a Monte Carlo simulation to get a better p-value
moran.mc(DF$_____,
           ________(nb_k3, s zero.policy=TRUE),
         zero.policy=TRUE, nsim = 999)

# Run a Moran I test on Dansk Folkeparti votes in 2011 based on k neighbors
moran.test(DF$_____,
           ________(nb_k3, style = "W",zero.policy=TRUE),
           zero.policy=TRUE)

# Do a Monte Carlo simulation to get a better p-value
moran.mc(DF$___________,
         ________(nb_k3, zero.policy=TRUE),
         zero.policy=TRUE, nsim = 999)